-
Free, publicly-accessible full text available June 16, 2026
-
Titolo, Laura (Ed.)
Many recent computational accelerators provide non-standard (e.g., reduced precision) arithmetic operations to enhance performance for floating-point matrix multiplication. Unfortunately, the properties of these accelerators are not widely understood, and their behavior is not sufficiently documented. This makes it difficult for tool builders beyond the original vendor to target or simulate the hardware correctly, or for algorithm designers to be confident in their code. To address these gaps, prior studies have probed the behavior of these units with manually crafted tests. Such tests are cumbersome to design, and adapting them as the accelerators evolve requires repeated manual effort. We present a formal model for the tensor cores of NVIDIA's Volta, Turing, and Ampere GPUs. We identify specific properties—rounding mode, precision, and accumulation order—that drive these cores' behavior. We formalize these properties and then use the formalization to automatically generate discriminating inputs that illustrate differences among machines. Our results confirm many of the findings of previous tensor core studies, but also identify subtle disagreements. In particular, NVIDIA's machines do not, as previously reported, use round-to-zero for accumulation, and their 5-term accumulator requires 3 extra carry-out bits for full accuracy. Using our formal model, we analyze two existing algorithms that use half-precision tensor cores to accelerate single-precision multiplication with error correction. Our analysis reveals that the newer algorithm, designed to be more accurate than the first, is actually less accurate for certain inputs.
Free, publicly-accessible full text available June 12, 2026
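To illustrate the discriminating-input idea, here is a minimal Python sketch, not the paper's formal model: it emulates a dot-product accumulator with a p-bit significand under two candidate rounding modes and shows an input whose results separate round-to-zero from round-to-nearest. The precision P and the example inputs are illustrative assumptions, not the tensor cores' actual parameters.

```python
# A toy reduced-precision accumulator (hypothetical, not NVIDIA's hardware):
# round after every addition and compare two candidate rounding modes.
import math

P = 11  # assumed significand precision of the toy accumulator (hypothetical)

def round_to_precision(x, p, mode):
    """Round x to p significand bits using the given rounding mode."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log2(abs(x)))   # exponent of x
    scale = 2.0 ** (e - (p - 1))        # value of one ulp at this exponent
    q = x / scale
    if mode == "rz":                    # round toward zero (truncate)
        q = math.trunc(q)
    else:                               # round to nearest
        q = round(q)
    return q * scale

def dot(a, b, mode):
    """Left-to-right dot-product accumulation, rounding after each addition."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_to_precision(acc + x * y, P, mode)
    return acc

# Discriminating input: a large term followed by a small term whose sum falls
# strictly between representable values, so truncation and nearest disagree.
a = [1.0, 2.0 ** -10]
b = [1.0, 0.75]
print(dot(a, b, "rz"))  # 1.0           (small term truncated away)
print(dot(a, b, "rn"))  # 1.0009765625  (small term rounds the sum up)
```

Running such candidate inputs on real hardware and comparing against each rounding-mode model is, in spirit, how a formalization can discriminate among machine behaviors.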
-
Melquiond, Guillaume; Tang, Ping Tak Peter (Ed.)
Theorem proving demonstrates promising potential for verifying problems beyond the capabilities of SMT-solver-based verification tools. We explore and showcase the capability of Lean, an increasingly popular theorem-proving tool, in deriving the error bounds of table-based Logarithmic Number Systems (LNS). LNS reduces the number of bits needed to represent a high dynamic range of real numbers with finite precision and efficiently performs multiplication and division. However, in LNS, addition and subtraction become non-linear functions that must be approximated—typically using precomputed look-up tables. We provide the first rigorous analysis of LNS that covers first-order Taylor approximation, the cotransformation techniques inspired by the European Logarithmic Microprocessor, and the errors introduced by the fixed-point arithmetic involved in LNS implementations. By analyzing all error sources, deriving symbolic error bounds for each, and accumulating these to obtain the final error bound, we prove the correctness of these bounds using Lean and its Mathlib library. We empirically validate our analysis using an exhaustive Python implementation, demonstrating that our analytical interpolation bounds are tight and that our analytical cotransformation bounds overestimate by between one and two bits.
Free, publicly-accessible full text available May 5, 2026
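The table-based approximation at the heart of this analysis is easy to sketch. Below is a small Python illustration, not the paper's Lean development: LNS addition via a precomputed table of the Gaussian log Φ(d) = log2(1 + 2^d) with first-order Taylor interpolation. The table spacing DELTA and range D_MIN are placeholder assumptions, and the closing error check mirrors the paper's empirical validation, not its formal proof.

```python
# Toy table-based LNS addition with first-order Taylor interpolation.
import math

DELTA = 2.0 ** -6   # assumed table spacing (illustrative)
D_MIN = -16.0       # assumed table range: Phi(d) treated as 0 below this

def phi(d):
    """Gaussian log for addition: Phi(d) = log2(1 + 2^d), d <= 0."""
    return math.log2(1.0 + 2.0 ** d)

def dphi(d):
    """Derivative of Phi, used as the first-order Taylor coefficient."""
    return (2.0 ** d) / (1.0 + 2.0 ** d)

# Precompute (Phi(d0), Phi'(d0)) at multiples of DELTA.
N = int(-D_MIN / DELTA)
TABLE = [(phi(-i * DELTA), dphi(-i * DELTA)) for i in range(N + 1)]

def phi_approx(d):
    """First-order Taylor interpolation from the precomputed table."""
    if d <= D_MIN:
        return 0.0
    i = int(-d / DELTA)            # nearest table point at or above d
    d0 = -i * DELTA
    f0, g0 = TABLE[i]
    return f0 + g0 * (d - d0)      # Phi(d0) + Phi'(d0) * (d - d0)

def lns_add(a, b):
    """Add two LNS values (logs of positive reals) via the table."""
    hi, lo = max(a, b), min(a, b)
    return hi + phi_approx(lo - hi)

# Grid check of the interpolation error, echoing the empirical validation;
# the analytical bound itself is what the paper proves in Lean.
worst = max(abs(phi_approx(d) - phi(d))
            for d in [-k * DELTA / 7 for k in range(1, 7 * N)])
print(f"max interpolation error ~ 2^{math.log2(worst):.1f}")
```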
-
Free, publicly-accessible full text available January 1, 2026
-
Free, publicly-accessible full text available November 17, 2025
-
Error-bounded lossy compression has been a critical technique for significantly reducing the sheer volume of simulation data produced by high-performance computing (HPC) scientific applications while effectively controlling the data distortion according to a user-specified error bound. In many real-world use cases, users must perform computational operations on the compressed data. However, none of the existing error-bounded lossy compressors support such operations, inevitably incurring undesired decompression costs. In this paper, we propose a novel error-bounded lossy compressor (called SZOps) that supports not only error-bounding features but also efficient computations (including negation, scalar addition, scalar multiplication, mean, variance, etc.) on the compressed data without a complete decompression step; to the best of our knowledge, this is the first such attempt. We develop several optimization strategies to maximize the overall compression ratio and execution performance. We evaluate SZOps against other state-of-the-art lossy compressors on multiple real-world scientific application datasets.
Free, publicly-accessible full text available November 17, 2025
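To make "computation without decompression" concrete, the following Python sketch uses a toy uniform-quantization representation as a stand-in, not SZOps itself: because every value reconstructs as offset + step * code, negation, scalar addition, scalar multiplication, mean, and variance can all be computed from the compact metadata and code statistics alone. The class and its representation are illustrative assumptions.

```python
# Toy "operable" compressed representation: x ~ offset + step * code,
# with step = 2 * error_bound. Real compressors would predict, encode,
# and entropy-code the integer codes rather than store them raw.
import numpy as np

class ToyCompressed:
    def __init__(self, data, error_bound):
        self.step = 2.0 * error_bound
        self.offset = float(data.min())
        self.codes = np.round((data - self.offset) / self.step).astype(np.int64)

    def decompress(self):
        return self.offset + self.step * self.codes

    # --- operations applied without reconstructing the values ---
    def scalar_add(self, c):   # (x + c): only the offset changes
        self.offset += c

    def scalar_mul(self, c):   # (c * x): scale offset and step together
        self.offset *= c
        self.step *= c

    def negate(self):          # (-x): a special case of scalar_mul
        self.scalar_mul(-1.0)

    def mean(self):            # E[x] from the code statistics
        return self.offset + self.step * self.codes.mean()

    def variance(self):        # Var[x] = step^2 * Var[codes]
        return self.step ** 2 * self.codes.var()

data = np.random.default_rng(0).normal(size=1000)
c = ToyCompressed(data, error_bound=1e-3)
c.scalar_add(5.0)
c.scalar_mul(2.0)
# The compressed-domain mean tracks the true mean within the error bound.
print(abs(c.mean() - (2 * (data + 5.0)).mean()) < 1e-2)
```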
-
Recent demand for distributed software has led to a surge in the popularity of actor-based frameworks. However, even with the stylized message-passing model of actors, writing correct distributed software is still difficult. We present our work on linearizability checking in DS2, an integrated framework for specifying, synthesizing, and testing distributed actor systems. The key insight of our approach is that subcomponents of distributed actor systems often implement common algorithms or data structures (e.g., a distributed hash table or tree) that can be validated against a simple sequential model of the system. This makes it easy for developers to validate their concurrent actor systems without complex specifications. DS2 automatically explores the concurrent schedules the system could arrive at and compares the observed output of the system to ensure it is equivalent to what the sequential implementation could have produced. We describe DS2's linearizability checking and test it on several concurrent replication algorithms from the literature. We explore in detail how different algorithms for enumerating the model's schedule space fare at finding bugs in actor systems, and we present our own refinements to algorithms for exploring actor system schedules, which we show are effective in finding bugs.
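The core checking loop is easy to illustrate. The Python sketch below is a simplification, not DS2's implementation: it enumerates the interleavings of each actor's operation sequence, replays each interleaving against a simple sequential model (here, a register), and accepts an observed outcome only if some interleaving reproduces it.

```python
# Minimal schedule-enumeration linearizability check against a sequential model.

def interleavings(seqs):
    """Yield all interleavings of several per-actor operation sequences."""
    if all(not s for s in seqs):
        yield []
        return
    for i, s in enumerate(seqs):
        if s:
            rest = [list(t) for t in seqs]   # copy, then take one op from actor i
            head = rest[i].pop(0)
            for tail in interleavings(rest):
                yield [head] + tail

def run_sequential(ops):
    """Sequential model: a register supporting write(v) and read()."""
    state, outputs = None, []
    for op, arg in ops:
        if op == "write":
            state = arg
        else:                                # read
            outputs.append(state)
    return outputs

def linearizable(per_actor_ops, observed_reads):
    """True iff some interleaving of the actors' ops explains the observation."""
    return any(run_sequential(s) == observed_reads
               for s in interleavings(per_actor_ops))

# Two actors: one writes 1 then 2; the other reads twice.
actors = [[("write", 1), ("write", 2)], [("read", None), ("read", None)]]
print(linearizable(actors, [1, 2]))   # True: the reads can straddle the writes
print(linearizable(actors, [2, 1]))   # False: no schedule reads 2 before 1
```

Enumerating all interleavings explodes combinatorially, which is why schedule-exploration algorithms and their refinements, as studied in the work above, matter in practice.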
-
The oceanographic conditions of the Southern California Bight (SCB) dictate the distribution and abundance of prey resources and therefore the presence of mobile predators, such as goose-beaked whales (Ziphius cavirostris). Goose-beaked whales are deep-diving odontocetes that spend the majority of their time foraging at depth. Due to their cryptic behavior, little is known about how they respond to seasonal and interannual changes in their environment. This study utilizes passive acoustic data recorded at two sites within the SCB to explore the oceanographic conditions that goose-beaked whales appear to favor. Using optimum multiparameter analysis of modeled temperature and salinity data, three source waters are identified and quantified: Pacific Subarctic Upper Water (PSUW), Pacific Equatorial Water (PEW), and Eastern North Pacific Central Water (ENPCW). The interannual and seasonal variability in goose-beaked whale presence was related to the variability in El Niño Southern Oscillation events and to the fraction and vertical distribution of the three source waters. Goose-beaked whale acoustic presence was highest during the winter and spring and decreased during the late summer and early fall. These seasonal increases occurred at times of increased fractions of PEW in the California Undercurrent and decreased fractions of ENPCW in surface waters. Interannual increases in goose-beaked whale presence occurred during El Niño events. These results establish a baseline understanding of the oceanographic characteristics that correlate with goose-beaked whale presence in the SCB. Furthering our knowledge of this elusive species is key to understanding how anthropogenic activities impact goose-beaked whales.
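For readers unfamiliar with optimum multiparameter (OMP) analysis, the Python sketch below shows the underlying computation: a non-negative least-squares fit of source-water fractions to observed temperature and salinity under a mass-conservation constraint. The endmember values are illustrative placeholders, not the study's fitted source-water properties.

```python
# Toy OMP water-mass analysis: solve A x ~ b for non-negative fractions x.
import numpy as np
from scipy.optimize import nnls

# Rows: temperature (deg C), salinity (PSU), mass conservation (sum to 1).
# Columns: PSUW, PEW, ENPCW endmembers (hypothetical values for illustration).
A = np.array([
    [8.0, 10.0, 17.0],    # endmember temperatures
    [33.6, 34.6, 34.2],   # endmember salinities
    [1.0, 1.0, 1.0],      # mass-conservation row
])

# Weight each equation; mass conservation is typically weighted heavily
# so the fitted fractions sum close to 1.
w = np.array([1.0, 1.0, 10.0])

def omp_fractions(temp, sal):
    """Non-negative least-squares fit of the three source-water fractions."""
    b = np.array([temp, sal, 1.0])
    fracs, _ = nnls(A * w[:, None], b * w)
    return fracs / fracs.sum()     # renormalize any residual mass imbalance

print(omp_fractions(temp=9.5, sal=34.3))  # e.g., mostly a PSUW/PEW mixture
```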